How do I deal with content scrapers? [closed]
Posted by aem on Pro Webmasters
Published on 2012-04-04T02:45:47Z
Possible Duplicate:
How to protect SHTML pages from crawlers/spiders/scrapers?
My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling that name turns up various reports from people who've concluded that it doesn't respect robots.txt (e.g., http://www.0sw.com/archives/96).
I'm considering updating my app to keep a list of banned user-agents, serving all requests from those user-agents a 400 (or similar) response, and adding GSLFBot to that list. Is that an effective technique? If not, what should I do instead?
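The approach described above can be sketched as a Rack middleware. This is a minimal illustration, assuming the app is a Rack-based Ruby app (typical of Heroku's Bamboo stack); the `BlockScrapers` class name and the choice of a 403 status are my own, not anything from the original question.

```ruby
# Minimal sketch of user-agent blocking as Rack middleware.
# Assumption: a Rack-based Ruby app; class name and patterns are illustrative.
class BlockScrapers
  # Patterns for user-agents to reject; extend as needed.
  BANNED_AGENTS = [/GSLFBot/i].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    ua = env['HTTP_USER_AGENT'].to_s
    if BANNED_AGENTS.any? { |pattern| ua =~ pattern }
      # Reject banned agents early, before the request reaches the app.
      [403, { 'Content-Type' => 'text/plain' }, ['Forbidden']]
    else
      @app.call(env)
    end
  end
end
```

It would be wired in via `use BlockScrapers` in `config.ru` (or `config.middleware.use BlockScrapers` in a Rails initializer), so banned agents are turned away before hitting application code or the database.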
(As a side note, it seems odd for an abusive scraper to use such a distinctive user-agent.)
© Pro Webmasters or respective owner